Poster: hank_b Date: Mar 9, 2010 11:49pm

Forum: texts Subject: Re: PDF's with text layer

Unfortunately, we're not currently set up to extract the text layer from contributed PDFs.

Having significant numbers of PDFs with text layers uploaded is a relatively new thing - our processing pathways were all originally designed to work from images, not PDFs. We do now derive from contributed PDFs, but only through the makeshift approach of extracting the images from the PDF and processing them as though they came from scanning a book. That means the OCR results you see (via the "Full Text" link) come from running OCR on your images, so yes, they have the same (limited) level of accuracy as the OCR we produce for scanned books.

Extracting text directly from contributed PDFs is on the to-do list, but I'm afraid I can't give you any estimate of when it will happen.

Hank Bromley
software engineer
Internet Archive
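
The image-only derive path described in this post can be pictured with a small sketch. It is only an illustration under stated assumptions, not the Archive's actual code: `PdfPage`, `derive_full_text`, and `fake_ocr` are hypothetical names, and the toy OCR function stands in for a real engine. The point it shows is that the embedded text layer is never read, so a proofread text layer has no effect on the "Full Text" output.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PdfPage:
    """Hypothetical stand-in for one page of a contributed PDF."""
    image: bytes                      # the page's raster image
    text_layer: Optional[str] = None  # embedded text, if the uploader supplied one


def derive_full_text(pages: List[PdfPage], ocr: Callable[[bytes], str]) -> str:
    """Model of the image-only path: the text layer is never consulted."""
    # Step 1: pull out only the images, as if they were book scans.
    images = [page.image for page in pages]
    # Step 2: OCR every image; accuracy is whatever the OCR engine delivers.
    return "\n".join(ocr(image) for image in images)


def fake_ocr(image: bytes) -> str:
    """Toy OCR (an assumption for this sketch) that misreads 'l' as '1'."""
    return image.decode("ascii").replace("l", "1")


# One page whose proofread text layer is correct but goes unused.
pages = [PdfPage(image=b"hello world", text_layer="hello world")]
full_text = derive_full_text(pages, fake_ocr)
print(full_text)
```

Extracting text directly, the to-do item mentioned above, would amount to preferring `text_layer` when it is present and falling back to OCR only for pages without one.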

Poster: genet Date: Jan 23, 2011 12:14pm

Forum: texts Subject: Re: PDF's with text layer

Any news on this front? I am preparing more uploads with carefully proofread text layers, and it would sure be nice if they could be used!
Thanks,
-Gene

Poster: hank_b Date: Jan 23, 2011 12:32pm

Forum: texts Subject: Re: PDF's with text layer

Gene-

Unfortunately, no, nothing new to report on using the text layer found within contributed PDFs.

-- Hank

Poster: virtualverse0 Date: Apr 19, 2017 7:03am

Forum: texts Subject: Re: PDF's with text layer

Any news about extracting text directly from contributed PDFs? And what about uploading my own OCR XML file (made with Adobe Acrobat Pro DC and carefully revised by hand) to replace the ABBYY OCR XML provided by archive.org?
This post was modified by virtualverse on 2017-04-19 14:03:06

Poster: genet Date: Jan 23, 2011 12:36pm

Forum: texts Subject: Re: PDF's with text layer

Thanks again, Hank!
-Gene

Poster: genet Date: Mar 15, 2010 10:14am

Forum: texts Subject: Re: PDF's with text layer

Thanks, Hank!
I just wanted to be sure that I wasn't doing anything incorrectly.
-Gene
